Estimating Reference Scopes of Wikipedia Article Inner-links
Abstract
Wikipedia is the largest online encyclopedia and is widely used as a machine-readable knowledge and semantic resource. Links within Wikipedia indicate that two articles, or parts of them, are topically related. Existing link detection methods focus on article titles because most links in Wikipedia point to article titles. However, a number of links point to specific segments of an article, because the whole article is too general for readers to grasp the intention of the link. We propose a method that automatically predicts whether a link target is a specific segment and identifies which segment is most relevant. We combine Latent Dirichlet Allocation (LDA) and Maximum Likelihood Estimation (MLE) to represent every segment as a vector, and then compute the similarity of each segment pair. We then use variance, standard deviation, and other statistical features to predict the results. We also try the Word2Vector model to embed all segments into a semantic space and calculate cosine similarities between segment pairs, and then train a Random Forest classifier to predict link scopes. In evaluations on Wikipedia articles, our method achieved reasonable results.
Keywords: Wikipedia, link suggestion, LDA, word2vector, PMI
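The similarity-and-statistics step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature set, and the toy vectors are assumptions made for illustration. In the paper the vectors would come from LDA/MLE or Word2Vector; here they are hand-written values. The sketch computes the cosine similarity between a source vector and each target segment, then summarizes the scores with the kind of statistical features the abstract mentions.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_features(source_vec, target_segment_vecs):
    """Summarize how similar a source vector is to each target segment.

    The returned keys are hypothetical feature names, not taken from the paper.
    """
    sims = [cosine(source_vec, seg) for seg in target_segment_vecs]
    return {
        # Index of the segment with the highest similarity.
        "best_segment": max(range(len(sims)), key=sims.__getitem__),
        "max": max(sims),
        "mean": statistics.fmean(sims),
        # Population variance and standard deviation of the scores.
        "variance": statistics.pvariance(sims),
        "std": statistics.pstdev(sims),
    }


# Toy vectors (assumed values, standing in for real segment embeddings).
source = [0.9, 0.1, 0.0]
segments = [
    [0.3, 0.4, 0.3],
    [1.0, 0.0, 0.0],
    [0.0, 0.2, 0.8],
]
features = similarity_features(source, segments)
print(features)
```

A high variance suggests that one segment stands out from the rest, which is the kind of signal a classifier such as the Random Forest mentioned in the abstract could use to decide whether a link should point to a segment rather than the whole article.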
Similar resources
Sense and Reference Disambiguation in Wikipedia
Wikipedia articles are annotated by volunteer contributors with numerous links that connect words and phrases to relevant titles in Wikipedia. In this paper, we identify inconsistencies in the user annotation of links and show that they can have a substantial impact on the performance of word sense disambiguation systems that are trained on Wikipedia links. We describe two major types of link a...
Finding titles representing segments of Wikipedia Articles from keyphrases
Wikipedia is a free online encyclopedia that aims to allow anyone to edit any article or create them. However, articles tend to become long and complex, so giving appropriate titles or key phrases to untitled segments is necessary for reader assistance. In this paper, we show methods to select titles for representing article segments. Key phrase extraction has been studied for years, but we con...
Boosting Cross-Lingual Knowledge Linking via Concept Annotation
Automatically discovering cross-lingual links (CLs) between wikis can largely enrich the cross-lingual knowledge and facilitate knowledge sharing across different languages. In most existing approaches for cross-lingual knowledge linking, the seed CLs and the inner link structures are two important factors for finding new CLs. When there are insufficient seed CLs and inner links, discovering ne...
Wikiwhere: An interactive tool for studying the geographical provenance of Wikipedia references
Wikipedia articles about the same topic in different language editions are built around different sources of information. For example, one can find very different news articles linked as references in the English Wikipedia article titled “Annexation of Crimea by the Russian Federation” than in its German counterpart (determined via Wikipedia’s language links). Some of this difference can of cou...
The Task of Automatic Documents Clustering
In this paper we describe a new unsupervised algorithm for automatic documents clustering with the aid of Wikipedia. Contrary to other related algorithms in the field, our algorithm utilizes only two aspects of Wikipedia, namely its categories network and articles titles. We do not utilize the inner content of the articles in Wikipedia or their inner or inter links. The implemented algorithm wa...
Publication date: 2017